Media Tone Analysis: ABC News Coverage of U.S. Elections
Author
Zixu (Michael) Hao
1 Introduction
This analysis examines ABC News coverage patterns across multiple U.S. election cycles, focusing on tone changes and thematic shifts. Using GDELT’s Global Knowledge Graph data, we analyze:
How media tone fluctuates before and after elections
Which themes dominate coverage during electoral periods
How thematic focus shifts from pre- to post-election periods
Long-term trends in news sentiment across years of political coverage
1.1 Data Overview
The dataset contains ABC News articles from GDELT's Global Knowledge Graph, covering five election cycles:
- 2016 Presidential Election
- 2018 Midterm Elections
- 2020 Presidential Election
- 2022 Midterm Elections
- 2024 Presidential Election
2 Data Processing and Preparation
2.1 Data Import and Initial Cleaning
Code
import pandas as pd
import glob
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from collections import Counter
from scipy.stats import ttest_ind
import matplotlib.dates as mdates

# Set consistent styling for all plots
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['font.family'] = 'sans-serif'
plt.rcParams['font.sans-serif'] = ['Arial', 'DejaVu Sans', 'Liberation Sans']

# Load all ABC News CSV files
csv_files = glob.glob("../data/abc/abc*.csv")
df = pd.concat([pd.read_csv(file) for file in csv_files], ignore_index=True)

# Select relevant columns
columns_of_interest = ["parsed_date", "url", "headline_from_url",
                       "V2Themes", "V2Locations", "V2Persons",
                       "V2Organizations", "V2Tone"]
df = df[columns_of_interest]

# Convert parsed_date to datetime and ensure it's timezone-naive
df["parsed_date"] = pd.to_datetime(df["parsed_date"], errors="coerce").dt.tz_localize(None)

# Preview structure and missing values
print("DataFrame structure:")
df.info()
print("\nMissing values count:")
print(df.isnull().sum())
print("\nSample data:")
print(df.sample(5))
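GDELT's V2Tone field holds seven comma-separated values. We use the first three:
1. Overall tone score: the positive score minus the negative score. The theoretical range is -100 to +100, although most articles fall between -10 and +10.
2. Positive score: the percentage of words with a positive emotional connotation.
3. Negative score: the percentage of words with a negative emotional connotation.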
We extract these components for our analysis:
Code
# Split V2Tone into tone, positive_score, and negative_score
tone_split = df["V2Tone"].str.split(",", expand=True)
df["tone"] = pd.to_numeric(tone_split[0], errors="coerce")
df["positive_score"] = pd.to_numeric(tone_split[1], errors="coerce")
df["negative_score"] = pd.to_numeric(tone_split[2], errors="coerce")

# Descriptive statistics for tone components
tone_stats = pd.DataFrame({
    "Tone": df["tone"].describe(),
    "Positive Score": df["positive_score"].describe(),
    "Negative Score": df["negative_score"].describe(),
})
print("Tone metrics descriptive statistics:")
print(tone_stats)

# Create a histogram of tone distribution
plt.figure(figsize=(10, 6))
plt.hist(df["tone"].dropna(), bins=30, alpha=0.7, color='steelblue')
plt.axvline(df["tone"].mean(), color='red', linestyle='dashed', linewidth=1,
            label=f'Mean: {df["tone"].mean():.2f}')
plt.axvline(0, color='black', linestyle='solid', linewidth=1, label='Neutral Tone')
plt.title("Distribution of ABC News Tone Scores", fontsize=14, fontweight='bold')
plt.xlabel("Tone Score")
plt.ylabel("Frequency")
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Tone metrics descriptive statistics:
                Tone  Positive Score  Negative Score
count  121973.000000   121973.000000   121973.000000
mean       -3.094822        2.232947        5.327769
std         4.037294        1.601264        3.239981
min       -47.368421        0.000000        0.000000
25%        -5.454545        1.154734        3.000000
50%        -2.849389        2.005731        4.918033
75%        -0.452489        2.991453        7.124682
max        23.809524       23.809524       47.368421
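Note: GDELT tone scores can in principle range from -100 (extremely negative) to +100 (extremely positive). Most articles fall between -10 and +10, and the extremes in the ABC data are -47.4 and +23.8. ABC News coverage has a mean tone of about -3.1 (median -2.85). The middle 50% of articles fall between roughly -5.5 and -0.5, which reflects the generally negative tone common in news media. The mean tone also equals the mean positive score minus the mean negative score (2.23 - 5.33 = -3.09), as expected from how GDELT defines tone.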
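The sketch below applies the same extraction to a single record, using only the standard library, to make the field layout concrete. The sample string is made up for illustration and is not taken from the ABC data. It assumes the GKG layout described above: the first three of V2Tone's seven fields are tone, positive score and negative score, and tone equals positive minus negative.

```python
def parse_v2tone(raw: str) -> dict:
    """Parse the first three fields of a GDELT V2Tone string.

    Missing or non-numeric fields become None, mirroring
    pd.to_numeric(..., errors="coerce") in the main code.
    """
    names = ["tone", "positive_score", "negative_score"]
    fields = raw.split(",") if raw else []
    parsed = {}
    for i, name in enumerate(names):
        try:
            parsed[name] = float(fields[i])
        except (IndexError, ValueError):
            parsed[name] = None
    return parsed


# Hypothetical V2Tone value: tone, positive, negative, then four further fields
sample = "-3.5,1.5,5.0,6.5,22.1,0.8,400"
result = parse_v2tone(sample)
print(result)

# Tone is positive minus negative, so this difference should be zero
print(round(result["tone"] - (result["positive_score"] - result["negative_score"]), 6))
```

Only the first three fields are read, so the remaining four GKG tone fields are ignored, just as in the pandas version.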
2.3 Define Key Election Dates
Code
# Define key U.S. elections and COVID emergence
election_events = {
    "2016 Presidential": "2016-11-08",
    "2018 Midterms": "2018-11-06",
    "2020 Presidential": "2020-11-03",
    "2022 Midterms": "2022-11-08",
    "2024 Presidential": "2024-11-05",
    "COVID": "2020-03-10",
}
event_dates = {label: pd.to_datetime(date) for label, date in election_events.items()}

# Create a dictionary without COVID for analyses that only need election dates
election_dates = {k: v for k, v in event_dates.items() if k != "COVID"}
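The pre- and post-election windows used later are built with `pd.DateOffset(months=3)`. This offset performs calendar-month arithmetic rather than adding a fixed number of days. The sketch below shows the half-open windows used in the tone tests: pre covers [date - 3 months, date) and post covers [date, date + 3 months). The helper name `election_windows` is hypothetical and does not appear in the original analysis.

```python
import pandas as pd


def election_windows(date, months=3):
    """Return (pre_start, election_day, post_end) for half-open pre/post windows.

    Hypothetical helper: pre = [pre_start, election_day), post = [election_day, post_end).
    """
    day = pd.Timestamp(date)
    return day - pd.DateOffset(months=months), day, day + pd.DateOffset(months=months)


for label, date in {"2016 Presidential": "2016-11-08", "2020 Presidential": "2020-11-03"}.items():
    pre_start, day, post_end = election_windows(date)
    print(f"{label}: pre {pre_start.date()} to {day.date()} (exclusive), "
          f"post {day.date()} to {post_end.date()} (exclusive)")

# Calendar-month arithmetic clips to the month end when the day does not exist
print((pd.Timestamp("2020-11-30") + pd.DateOffset(months=3)).date())  # 2021-02-28
```

Because the offset is calendar-based, windows are not all the same number of days long. That matters little for means, but it slightly affects raw counts.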
2.4 Theme Name Mapping
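GDELT uses technical theme codes, which we convert to more readable names with a dictionary, theme_name_mapping. Every lookup uses theme_name_mapping.get(code, code), so codes without an entry are shown unchanged. USPEC_POLICY1 and WB_696_PUBLIC_SECTOR_MANAGEMENT in Section 6.1 are examples.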
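The mapping dictionary itself is not reproduced in this section. The sketch below is a hypothetical excerpt showing its likely shape: it pairs a few GKG-style theme codes with labels that appear later in this report. The specific code-to-label pairings are assumptions, not the author's actual table. It is deliberately named `example_theme_mapping` so it is not mistaken for the real one. The lookup pattern `.get(code, code)` is the one the analysis uses.

```python
# Hypothetical excerpt: the code-to-label pairings are illustrative assumptions,
# not the mapping used to produce the figures in this report.
example_theme_mapping = {
    "TAX_FNCACT_PRESIDENT": "Presidents",
    "LEADER": "Leaders",
    "GENERAL_GOVERNMENT": "Government",
    "USPEC_POLITICS_GENERAL1": "General Politics",
    "CRISISLEX_CRISISLEXREC": "Crisis Reporting",
    "IMMIGRATION": "Immigration",
}


def friendly_name(code: str) -> str:
    """Return the readable label for a theme code, or the code itself if unmapped."""
    return example_theme_mapping.get(code, code)


for code in ["LEADER", "IMMIGRATION", "USPEC_POLICY1"]:
    print(f"{code} -> {friendly_name(code)}")
```

The fallback is why some raw codes survive into the charts and printed lists. If two codes map to the same label, their rows can also collide in later pivots.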
Key Finding: All five elections showed a positive tone shift in the three months following the election compared to the three months before. This suggests a consistent pattern where post-election coverage tends to be less negative than pre-election coverage.
3.3.1 Statistical Significance Testing
Code
# Welch's t-tests (unequal variances) for the statistical significance of tone shifts
significance_results = []
for label, date in election_dates.items():
    pre = df[(df["parsed_date"] >= date - pd.DateOffset(months=3)) &
             (df["parsed_date"] < date)]["tone"].dropna()
    post = df[(df["parsed_date"] >= date) &
              (df["parsed_date"] < date + pd.DateOffset(months=3))]["tone"].dropna()
    t_stat, p_val = ttest_ind(post, pre, equal_var=False)
    significance_results.append({
        "Election": label,
        "t-statistic": round(t_stat, 4),
        "p-value": round(p_val, 4),
        "Significant": "Yes" if p_val < 0.05 else "No",
    })

# Convert to DataFrame for cleaner display
sig_df = pd.DataFrame(significance_results)
print("Statistical significance of tone shifts (t-test):")
print(sig_df)
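Interpretation: Suppose pre- and post-election coverage had the same mean tone. A p-value below 0.05 means a difference at least this large would then occur less than 5% of the time. It does not prove that the shift cannot be due to chance. Because post is passed to ttest_ind first, a positive t-statistic means post-election coverage is less negative. The magnitude of the t-statistic grows with sample size as well as with the size of the difference. With thousands of articles per window, even small shifts can therefore be significant, and an effect size such as Cohen's d is a better gauge of how large a shift is.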
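The sketch below shows why the t-statistic alone is not an effect size. It computes Welch's t (the unequal-variance statistic behind `ttest_ind(..., equal_var=False)`) and Cohen's d by hand with the standard library. The tone samples are made up, not drawn from the ABC data. Repeating the same samples 50 times leaves the distributions unchanged but raises the article count from 20 to 1,000.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t-statistic for mean(a) - mean(b), without assuming equal variances."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / len(a) + vb / len(b))


def cohens_d(a, b):
    """Cohen's d for mean(a) - mean(b), using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


# Made-up tone scores: post-election articles are 0.5 points less negative on average
pre = [-4.0, -3.0, -5.0, -2.0, -4.0] * 4   # 20 articles
post = [x + 0.5 for x in pre]              # 20 articles

for copies in (1, 50):                     # same distributions, 20 vs 1,000 articles
    a, b = post * copies, pre * copies
    print(f"n={len(a):>5}  t={welch_t(a, b):6.2f}  d={cohens_d(a, b):5.2f}")
```

In this toy case t rises from about 1.5 to about 11 as the sample grows, while d stays near 0.5. That is why a large t in Section 3.3.1 says more about sample size than about the size of the shift.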
4 Theme Analysis
4.1 Overall Theme Distribution
Code
# Drop missing themes and split by semicolon
themes_series = df["V2Themes"].dropna().str.split(";")
# Flatten all theme entries, keeping only the theme code before the comma (drops the offset)
all_themes = [theme.split(",")[0] for sublist in themes_series for theme in sublist if theme]
# Count the most frequent themes
theme_counts = Counter(all_themes).most_common(20)
# Map to friendly names
friendly_counts = [(theme_name_mapping.get(theme, theme), count) for theme, count in theme_counts]

# Create a visually appealing bar chart
theme_df = pd.DataFrame(friendly_counts, columns=['Theme', 'Count'])
theme_df = theme_df.sort_values('Count', ascending=False)
plt.figure(figsize=(12, 8))
bars = plt.barh(theme_df['Theme'], theme_df['Count'],
                color=plt.cm.viridis(np.linspace(0, 0.8, len(theme_df))))

# Add count labels
for bar in bars:
    width = bar.get_width()
    plt.text(width + (width * 0.01), bar.get_y() + bar.get_height() / 2, f'{width:,.0f}',
             ha='left', va='center', fontsize=10, fontweight='bold', color='dimgrey')

plt.gca().invert_yaxis()  # barh draws the first row at the bottom; put the top theme first
plt.title("Top 20 Themes in ABC News Coverage", fontsize=16, fontweight='bold')
plt.xlabel('Frequency (theme mentions)', fontsize=12)
plt.grid(axis='x', linestyle='--', alpha=0.7, color='lightgrey')
plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.tight_layout()
plt.show()
The visualization shows ABC News’ dominant themes across the full dataset period:
Political Focus: Presidential coverage, leadership, and general politics dominate
Immigration: A consistently prominent theme in ABC News coverage
Other Notable Themes: Government operations, crisis reporting, and economic issues
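The counting code keeps only the part of each V2Themes entry before the comma, so each entry is evidently a theme code followed by a character offset. GDELT documents that its enhanced theme field lists a theme once per mention. The counts above are therefore mention counts, which give long articles more weight. The sketch below uses two made-up V2Themes strings to contrast mention counts with the number of articles that mention each theme at least once. The helper `theme_codes` is hypothetical.

```python
from collections import Counter


def theme_codes(v2themes: str) -> list:
    """Split a V2Themes string ('CODE,offset;CODE,offset;...') into theme codes."""
    return [entry.split(",")[0] for entry in v2themes.split(";") if entry]


# Made-up V2Themes values for two articles
articles = [
    "IMMIGRATION,120;LEADER,340;IMMIGRATION,915;IMMIGRATION,1402;",
    "LEADER,88;GENERAL_GOVERNMENT,510;",
]

# Every mention counts once
mention_counts = Counter(code for text in articles for code in theme_codes(text))
# Each article counts at most once per theme
article_counts = Counter(code for text in articles for code in set(theme_codes(text)))

print("mentions:", dict(mention_counts))
print("articles:", dict(article_counts))
```

Here IMMIGRATION leads on mentions (3) but reaches only one article, while LEADER appears in both. Which measure fits depends on whether "prominence" should mean volume of discussion or breadth of coverage.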
4.2 Pre-Election Theme Analysis
Code
# Create visualization for themes 3 months before each election
# Define a professional color palette
palette = plt.cm.viridis(np.linspace(0, 0.9, 10))

# Create subplot grid with adjusted layout
fig, axes = plt.subplots(len(election_dates), 1, figsize=(14, 5 * len(election_dates)))
fig.subplots_adjust(hspace=0.5)

# Handle single-election case
if len(election_dates) == 1:
    axes = [axes]

# For each election, get the most common themes in the 3 months before
for i, (election, date) in enumerate(election_dates.items()):
    pre_start = date - pd.DateOffset(months=3)
    pre_end = date - pd.DateOffset(days=1)

    # Get themes for this time period
    election_window = (df["parsed_date"] >= pre_start) & (df["parsed_date"] <= pre_end)
    pre_election_themes = df.loc[election_window, "V2Themes"].dropna().str.split(";")

    # Extract and count themes
    theme_counts = [theme.split(",")[0] for sublist in pre_election_themes for theme in sublist if theme]
    top_themes = Counter(theme_counts).most_common(10)

    # Map to friendly names
    friendly_themes = [(theme_name_mapping.get(theme, theme), count) for theme, count in top_themes]

    # Create DataFrame for this election (descending, so invert_yaxis puts the largest on top)
    theme_df = pd.DataFrame(friendly_themes, columns=['Theme', 'Count'])
    theme_df = theme_df.sort_values('Count', ascending=False)

    # Plot horizontal bar chart
    ax = axes[i]
    bars = ax.barh(theme_df['Theme'], theme_df['Count'], color=palette, height=0.7)

    # Add count labels
    for bar in bars:
        width = bar.get_width()
        ax.text(width + (width * 0.01), bar.get_y() + bar.get_height() / 2, f'{width:,.0f}',
                ha='left', va='center', fontsize=10, fontweight='bold', color='dimgrey')

    # Set titles and labels
    ax.set_title(f"Top Media Themes: 3 Months Before {election}", fontsize=16, fontweight='bold', pad=20)
    ax.set_xlabel('Frequency', fontsize=12)
    ax.set_ylabel('')
    ax.invert_yaxis()

    # Improve styling
    ax.grid(axis='x', linestyle='--', alpha=0.7, color='lightgrey')
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)

    # Annotate the date range
    date_range = f"({pre_start.strftime('%b %d, %Y')} - {pre_end.strftime('%b %d, %Y')})"
    ax.text(0.5, 1.05, date_range, transform=ax.transAxes, ha='center',
            fontsize=12, fontstyle='italic', color='grey')

plt.suptitle("Pre-Election Media Focus: ABC News Themes Before Each Election",
             fontsize=20, y=1.02, fontweight='bold')
plt.tight_layout()
plt.show()
Key Observations:
- Presidential themes dominate coverage in presidential election years
- Immigration appears consistently across multiple election cycles
- Some themes are election-specific (e.g., the prominence of healthcare in certain cycles)
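Sections 4.2 to 4.4 repeat the same filter, split and count steps and differ only in their window bounds. The sketch below refactors those steps into one function. It assumes a DataFrame with the `parsed_date` and `V2Themes` columns used above. The helper name `top_themes_in_window` is hypothetical, and the three-row DataFrame is made-up test data.

```python
from collections import Counter

import pandas as pd


def top_themes_in_window(frame, start, end, n=10, mapping=None):
    """Return the n most frequent theme codes (or mapped names) for start <= date <= end."""
    mapping = mapping or {}
    in_window = frame["parsed_date"].between(start, end)  # inclusive on both ends
    entries = frame.loc[in_window, "V2Themes"].dropna().str.split(";")
    codes = [e.split(",")[0] for entry_list in entries for e in entry_list if e]
    return [(mapping.get(code, code), count) for code, count in Counter(codes).most_common(n)]


sample = pd.DataFrame({
    "parsed_date": pd.to_datetime(["2016-09-01", "2016-10-15", "2016-12-01"]),
    "V2Themes": ["LEADER,10;IMMIGRATION,50;", "LEADER,5;", "GENERAL_GOVERNMENT,7;"],
})
election_day = pd.Timestamp("2016-11-08")
print(top_themes_in_window(sample, election_day - pd.DateOffset(months=3),
                           election_day - pd.DateOffset(days=1)))
```

Each of the three loops could then call the helper with (date - 3 months, date - 1 day), (date + 1 day, date + 3 months), or (date + 1 day, date + 6 months). Only the plotting code would remain per section.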
4.3 Post-Election Theme Analysis (3 Months)
Code
# Create visualization for themes 3 months after each election
fig, axes = plt.subplots(len(election_dates), 1, figsize=(14, 5 * len(election_dates)))
fig.subplots_adjust(hspace=0.5)

# Handle single-election case
if len(election_dates) == 1:
    axes = [axes]

# For each election, get the most common themes in the 3 months after
for i, (election, date) in enumerate(election_dates.items()):
    post_start = date + pd.DateOffset(days=1)
    post_end = date + pd.DateOffset(months=3)

    # Get themes for this time period
    election_window = (df["parsed_date"] >= post_start) & (df["parsed_date"] <= post_end)
    post_election_themes = df.loc[election_window, "V2Themes"].dropna().str.split(";")

    # Extract and count themes
    theme_counts = [theme.split(",")[0] for sublist in post_election_themes for theme in sublist if theme]
    top_themes = Counter(theme_counts).most_common(10)

    # Map to friendly names
    friendly_themes = [(theme_name_mapping.get(theme, theme), count) for theme, count in top_themes]

    # Create DataFrame for this election (descending, so invert_yaxis puts the largest on top)
    theme_df = pd.DataFrame(friendly_themes, columns=['Theme', 'Count'])
    theme_df = theme_df.sort_values('Count', ascending=False)

    # Plot horizontal bar chart
    ax = axes[i]
    bars = ax.barh(theme_df['Theme'], theme_df['Count'], color=palette, height=0.7)

    # Add count labels
    for bar in bars:
        width = bar.get_width()
        ax.text(width + (width * 0.01), bar.get_y() + bar.get_height() / 2, f'{width:,.0f}',
                ha='left', va='center', fontsize=10, fontweight='bold', color='dimgrey')

    # Set titles and labels
    ax.set_title(f"Top Media Themes: 3 Months After {election}", fontsize=16, fontweight='bold', pad=20)
    ax.set_xlabel('Frequency', fontsize=12)
    ax.set_ylabel('')
    ax.invert_yaxis()

    # Improve styling
    ax.grid(axis='x', linestyle='--', alpha=0.7, color='lightgrey')
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)

    # Annotate the date range
    date_range = f"({post_start.strftime('%b %d, %Y')} - {post_end.strftime('%b %d, %Y')})"
    ax.text(0.5, 1.05, date_range, transform=ax.transAxes, ha='center',
            fontsize=12, fontstyle='italic', color='grey')

plt.suptitle("Post-Election Media Focus: ABC News Themes After Each Election",
             fontsize=20, y=1.02, fontweight='bold')
plt.tight_layout()
plt.show()
Post-Election Media Focus:
- Presidential themes often remain dominant immediately after elections
- Government administration themes become more prominent in the post-election period
- Some campaign-related themes decrease in prominence
4.4 Extended Post-Election Coverage (6 Months)
Code
# Create visualization for themes 6 months after each election
fig, axes = plt.subplots(len(election_dates), 1, figsize=(14, 5 * len(election_dates)))
fig.subplots_adjust(hspace=0.5)

# Handle single-election case
if len(election_dates) == 1:
    axes = [axes]

# For each election, get the most common themes in the 6 months after
for i, (election, date) in enumerate(election_dates.items()):
    post_start = date + pd.DateOffset(days=1)
    post_end = date + pd.DateOffset(months=6)

    # Get themes for this time period
    election_window = (df["parsed_date"] >= post_start) & (df["parsed_date"] <= post_end)
    post_election_themes = df.loc[election_window, "V2Themes"].dropna().str.split(";")

    # Extract and count themes
    theme_counts = [theme.split(",")[0] for sublist in post_election_themes for theme in sublist if theme]
    top_themes = Counter(theme_counts).most_common(10)

    # Map to friendly names
    friendly_themes = [(theme_name_mapping.get(theme, theme), count) for theme, count in top_themes]

    # Create DataFrame for this election (descending, so invert_yaxis puts the largest on top)
    theme_df = pd.DataFrame(friendly_themes, columns=['Theme', 'Count'])
    theme_df = theme_df.sort_values('Count', ascending=False)

    # Plot horizontal bar chart
    ax = axes[i]
    bars = ax.barh(theme_df['Theme'], theme_df['Count'], color=palette, height=0.7)

    # Add count labels
    for bar in bars:
        width = bar.get_width()
        ax.text(width + (width * 0.01), bar.get_y() + bar.get_height() / 2, f'{width:,.0f}',
                ha='left', va='center', fontsize=10, fontweight='bold', color='dimgrey')

    # Set titles and labels
    ax.set_title(f"Top Media Themes: 6 Months After {election}", fontsize=16, fontweight='bold', pad=20)
    ax.set_xlabel('Frequency', fontsize=12)
    ax.set_ylabel('')
    ax.invert_yaxis()

    # Improve styling
    ax.grid(axis='x', linestyle='--', alpha=0.7, color='lightgrey')
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)

    # Annotate the date range
    date_range = f"({post_start.strftime('%b %d, %Y')} - {post_end.strftime('%b %d, %Y')})"
    ax.text(0.5, 1.05, date_range, transform=ax.transAxes, ha='center',
            fontsize=12, fontstyle='italic', color='grey')

plt.suptitle("Extended Post-Election Coverage: 6-Month ABC News Themes",
             fontsize=20, y=1.02, fontweight='bold')
plt.tight_layout()
plt.show()
Extended Coverage Patterns:
- Over a 6-month post-election period, coverage shows a broader range of themes
- Governance and policy themes become more prominent compared to immediate post-election coverage
- Emerging issues often rise in prominence, diluting election-specific themes
4.5 Theme Shifts Before vs. After Elections
Code
# Function to get theme counts in a specific date range
def get_theme_counts(start_date, end_date):
    mask = (df["parsed_date"] >= start_date) & (df["parsed_date"] <= end_date)
    themes_series = df.loc[mask, "V2Themes"].dropna().str.split(";")
    all_themes = [theme.split(",")[0] for sublist in themes_series for theme in sublist if theme]
    return Counter(all_themes)

# Analyze themes before and after each election
theme_shift_analysis = {}
theme_shift_data = []  # Rows for the theme-shift DataFrame

for election, date in election_dates.items():
    pre_start = date - pd.DateOffset(months=3)
    pre_end = date - pd.DateOffset(days=1)
    post_start = date + pd.DateOffset(days=1)
    post_end = date + pd.DateOffset(months=3)

    pre_counts = get_theme_counts(pre_start, pre_end)
    post_counts = get_theme_counts(post_start, post_end)

    # Difference in theme frequencies over the union of both periods, so themes that
    # disappear after the election are kept as decreases (Counter returns 0 if missing)
    period_themes = set(pre_counts) | set(post_counts)
    theme_diff = {theme: post_counts[theme] - pre_counts[theme] for theme in period_themes}

    # Sort themes by the magnitude of change
    sorted_theme_diff = sorted(theme_diff.items(), key=lambda item: abs(item[1]), reverse=True)

    # Store top 10 themes with the most change
    theme_shift_analysis[election] = sorted_theme_diff[:10]

    # "Tone Shift" holds the change in mention count (post - pre), not a tone value
    for theme, shift in sorted_theme_diff[:10]:
        theme_shift_data.append({"Election": election, "Theme": theme, "Tone Shift": shift})

# Create theme_df from the collected data
theme_df = pd.DataFrame(theme_shift_data)
# Apply theme name mapping
theme_df["Theme"] = theme_df["Theme"].map(lambda x: theme_name_mapping.get(x, x))
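If the pre/post difference is computed only over the post-election themes, any theme that disappears after the election is silently dropped. Yet that is exactly the kind of large negative shift this table is meant to surface. The toy example below uses made-up theme labels and counts. It compares iterating over one Counter's keys with iterating over the union of both, and shows `Counter.subtract` as an equivalent alternative.

```python
from collections import Counter

# Made-up counts; "campaign" vanishes after the election
pre_counts = Counter({"campaign": 40, "leader": 100})
post_counts = Counter({"leader": 130, "transition": 25})

# Only post-election keys: the drop in "campaign" is never recorded
post_only = {t: post_counts[t] - pre_counts.get(t, 0) for t in post_counts}

# Union of keys: Counter returns 0 for missing themes, so declines are kept
union = {t: post_counts[t] - pre_counts[t] for t in set(pre_counts) | set(post_counts)}

# Counter.subtract keeps zero and negative results (unlike the "-" operator)
diff = Counter(post_counts)
diff.subtract(pre_counts)

print(sorted(post_only.items()))
print(sorted(union.items()))
print(sorted(diff.items()))
```

Note that `post_counts - pre_counts` (the Counter `-` operator) discards non-positive results, so it has the same blind spot as the post-only version.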
4.5.1 Direct Theme Comparison Visualizations
Code
# Create a visualization comparing top themes before and after each election
for election, date in election_dates.items():
    # Define time periods
    pre_start = date - pd.DateOffset(months=3)
    pre_end = date - pd.DateOffset(days=1)
    post_start = date + pd.DateOffset(days=1)
    post_end = date + pd.DateOffset(months=3)

    # Get pre-election themes
    pre_window = (df["parsed_date"] >= pre_start) & (df["parsed_date"] <= pre_end)
    pre_themes = df.loc[pre_window, "V2Themes"].dropna().str.split(";")
    pre_counts = [theme.split(",")[0] for sublist in pre_themes for theme in sublist if theme]
    pre_top = dict(Counter(pre_counts).most_common(15))

    # Get post-election themes
    post_window = (df["parsed_date"] >= post_start) & (df["parsed_date"] <= post_end)
    post_themes = df.loc[post_window, "V2Themes"].dropna().str.split(";")
    post_counts = [theme.split(",")[0] for sublist in post_themes for theme in sublist if theme]
    post_top = dict(Counter(post_counts).most_common(15))

    # Get all unique themes
    all_themes = set(pre_top.keys()) | set(post_top.keys())

    # Create dataframe with both periods
    comparison_data = []
    for theme in all_themes:
        friendly_name = theme_name_mapping.get(theme, theme)
        comparison_data.append({
            'Theme': friendly_name,
            'Pre-Election': pre_top.get(theme, 0),
            'Post-Election': post_top.get(theme, 0),
            'Difference': post_top.get(theme, 0) - pre_top.get(theme, 0),
        })

    # Create DataFrame and sort by absolute difference
    comp_df = pd.DataFrame(comparison_data)
    comp_df = comp_df.sort_values('Difference', key=abs, ascending=False).head(12)

    # Percentages relative to each period's top-15 total
    total_pre = sum(pre_top.values())
    total_post = sum(post_top.values())
    comp_df['Pre %'] = comp_df['Pre-Election'] / total_pre * 100
    comp_df['Post %'] = comp_df['Post-Election'] / total_post * 100
    comp_df['% Change'] = comp_df['Post %'] - comp_df['Pre %']

    # Create figure with two panels
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 10), gridspec_kw={'width_ratios': [3, 1]})

    # Plot 1: Side-by-side bar chart of counts
    comp_df = comp_df.sort_values('Theme')  # Alphabetical; both panels use this row order
    x = np.arange(len(comp_df))
    bar_height = 0.35

    pre_bars = ax1.barh(x - bar_height / 2, comp_df['Pre-Election'], bar_height,
                        label='Pre-Election', color='#3274A1', alpha=0.8)
    post_bars = ax1.barh(x + bar_height / 2, comp_df['Post-Election'], bar_height,
                         label='Post-Election', color='#E1812C', alpha=0.8)

    # Add labels and styling
    ax1.set_yticks(x)
    ax1.set_yticklabels(comp_df['Theme'])
    ax1.invert_yaxis()
    ax1.legend(loc='upper right')
    ax1.set_title(f'Theme Frequency Comparison for {election}', fontsize=16, fontweight='bold')
    ax1.set_xlabel('Count', fontsize=12)

    # Add count labels
    for bars, counts in [(pre_bars, comp_df['Pre-Election']), (post_bars, comp_df['Post-Election'])]:
        for bar, count in zip(bars, counts):
            if count > 0:
                ax1.text(count + 50, bar.get_y() + bar.get_height() / 2, f'{count:,.0f}',
                         ha='left', va='center', fontsize=9)

    # Plot 2: Net change, drawn at the same y positions as Plot 1 so the rows line up
    colors = ['#E15759' if d < 0 else '#4E79A7' for d in comp_df['Difference']]
    diff_bars = ax2.barh(x, comp_df['Difference'], color=colors)
    ax2.set_yticks(x)
    ax2.set_yticklabels([])  # Theme labels are shown in the first panel
    ax2.invert_yaxis()

    # Add a vertical line at zero
    ax2.axvline(x=0, color='black', linestyle='-', alpha=0.3)

    # Add labels
    for bar in diff_bars:
        diff = bar.get_width()
        label_x_pos = diff + np.sign(diff) * 50
        ha = 'left' if diff > 0 else 'right'
        ax2.text(label_x_pos, bar.get_y() + bar.get_height() / 2, f'{diff:+,.0f}',
                 ha=ha, va='center', fontsize=9)

    ax2.set_title('Net Change in Theme Frequency', fontsize=16, fontweight='bold')
    ax2.set_xlabel('Difference (Post - Pre)', fontsize=12)

    # Add overall title and subtitles
    plt.suptitle(f'Media Focus Shift: Before vs. After {election}', fontsize=20, fontweight='bold', y=0.98)
    pre_range = f"Pre: {pre_start.strftime('%b %d, %Y')} - {pre_end.strftime('%b %d, %Y')}"
    post_range = f"Post: {post_start.strftime('%b %d, %Y')} - {post_end.strftime('%b %d, %Y')}"
    fig.text(0.5, 0.91, f"{pre_range} | {post_range}", ha='center', fontsize=12, fontstyle='italic')

    # Add explanatory notes
    fig.text(0.5, 0.03, "Note: Blue bars in the right panel indicate themes that gained prominence "
             "after the election, while red bars show declining themes.",
             ha='center', fontsize=10, fontstyle='italic')

    plt.tight_layout()
    plt.subplots_adjust(top=0.88)
    plt.show()
Key Insights:
- Each election shows distinctive shifts in thematic focus
- Some themes consistently gain prominence after elections (e.g., Presidential coverage)
- Campaign-specific themes often decline after elections
- The right panel indicates which themes gain (blue) or lose (red) prominence
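The comparison above builds both sides from each period's top 15 only. A theme ranked 16th in one period is therefore recorded as 0 there, which can turn a small change into a large apparent gain or loss. For the same reason, the percentage columns are shares of the top-15 total rather than of all themes. The made-up counts below (labels are illustrative, with a top 3 standing in for the top 15) take the difference from truncated lists and from the full Counters.

```python
from collections import Counter

# Made-up counts; "economy" ranks just outside the pre-election top 3
pre = Counter({"leader": 100, "government": 90, "election": 80, "economy": 79})
post = Counter({"leader": 110, "government": 95, "economy": 85, "election": 20})

top_n = 3
pre_top = dict(pre.most_common(top_n))
post_top = dict(post.most_common(top_n))

# Difference taken from the truncated top-N lists (missing -> 0)
truncated = post_top.get("economy", 0) - pre_top.get("economy", 0)
# Difference taken from the full counts
full = post["economy"] - pre["economy"]

print(f"economy change from top-{top_n} lists: {truncated:+d}")
print(f"economy change from full counts:   {full:+d}")
```

One way to avoid the artifact is to choose the themes to display from the union of the two top-N sets, then look up their values in the full Counters.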
4.6 Theme Frequency Shifts Across Elections
Code
# Create a heatmap visualization of theme frequency shifts across elections
# pivot_table with aggfunc="sum" also handles codes that share a friendly name
pivot_df = theme_df.pivot_table(index="Theme", columns="Election", values="Tone Shift",
                                aggfunc="sum").fillna(0)

# Overall heatmap
plt.figure(figsize=(12, 10))
sns.heatmap(pivot_df, cmap="RdBu_r", center=0, annot=True, fmt=".0f", linewidths=0.5)
plt.title("Theme Frequency Shifts Across Elections", fontsize=16, fontweight='bold')
plt.ylabel("Theme", fontsize=12)
plt.xlabel("Election", fontsize=12)
plt.tight_layout()
plt.show()

# Individual election heatmaps for clearer detail
unique_elections = theme_df["Election"].unique()
for election in unique_elections:
    # Filter for this election and create a pivot table
    election_df = theme_df[theme_df["Election"] == election]
    single_df = election_df.pivot_table(index="Theme", columns="Election", values="Tone Shift",
                                        aggfunc="sum").fillna(0)
    plt.figure(figsize=(8, 10))
    sns.heatmap(single_df, cmap="RdBu_r", center=0, annot=True, fmt=".0f", linewidths=0.5)
    plt.title(f"Theme Frequency Shifts – {election}", fontsize=16, fontweight='bold')
    plt.xlabel("Election")
    plt.ylabel("Theme")
    plt.tight_layout()
    plt.show()
Understanding the Heatmap:
Rows (Y-axis): Each theme extracted from ABC News coverage
Columns (X-axis): Different election cycles
Colors:
Red = Increased theme frequency after the election
Blue = Decreased theme frequency after the election
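White = Little or no change (values near zero). Missing combinations are filled with 0, so a theme that is not among an election's ten largest shifts also appears as 0 for that election.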
Numbers: The raw count difference between post-election and pre-election periods
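If two GDELT codes share a friendly name, the mapped theme_df can contain the same (Theme, Election) pair twice. `DataFrame.pivot` then raises a `ValueError` about duplicate entries, whereas `pivot_table(aggfunc="sum")` adds the duplicates together. The sketch below shows the `pivot_table` result on made-up rows that use the same column names as theme_df ("Tone Shift" here is a change in mention count).

```python
import pandas as pd

# Made-up rows: two codes have already been mapped to the same label "Government"
rows = pd.DataFrame({
    "Election": ["2016 Presidential", "2016 Presidential", "2020 Presidential"],
    "Theme": ["Government", "Government", "Leaders"],
    "Tone Shift": [120, -30, 45],
})

# rows.pivot(...) would raise ValueError here because ("Government", "2016 Presidential") repeats
summed = rows.pivot_table(index="Theme", columns="Election", values="Tone Shift",
                          aggfunc="sum").fillna(0)
print(summed)
```

After `fillna(0)`, a 0 cell can mean either "no change" or "not among that election's largest shifts". That is worth keeping in mind when reading white cells.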
4.7 Theme Evolution Timeline
Code
# Create a timeline visualization showing how key themes evolved across all elections
# Select important themes to track over time
key_themes = ['Immigration', 'General Politics']
theme_codes = {v: k for k, v in theme_name_mapping.items() if v in key_themes}
theme_codes.update({k: k for k in key_themes if k not in theme_name_mapping.values()})

# Get monthly data for these themes
monthly_data = []

# Convert min and max years to integers explicitly
min_year = int(df['parsed_date'].dt.year.min())
max_year = int(df['parsed_date'].dt.year.max() + 1)

# Create timeline with monthly data points
for year in range(min_year, max_year):
    for month in range(1, 13):
        start_date = pd.Timestamp(f"{year}-{month:02d}-01")
        if month == 12:
            end_date = pd.Timestamp(f"{year+1}-01-01") - pd.Timedelta(days=1)
        else:
            end_date = pd.Timestamp(f"{year}-{month+1:02d}-01") - pd.Timedelta(days=1)

        # Skip months entirely outside our dataset (keeps a partial first month)
        if end_date < df['parsed_date'].min() or start_date > df['parsed_date'].max():
            continue

        # Get themes for this month
        mask = (df["parsed_date"] >= start_date) & (df["parsed_date"] <= end_date)
        if df.loc[mask].shape[0] == 0:  # Skip months with no data
            continue
        month_themes = df.loc[mask, "V2Themes"].dropna().str.split(";")
        all_month_themes = [theme.split(",")[0] for sublist in month_themes for theme in sublist if theme]
        theme_counter = Counter(all_month_themes)

        # Counts for our key themes, plus the month's total mentions across all themes
        for display_name, code in theme_codes.items():
            monthly_data.append({
                'date': start_date,
                'theme': display_name,
                'count': theme_counter.get(code, 0),
                'total': len(all_month_themes),
            })

# Convert to DataFrame
timeline_df = pd.DataFrame(monthly_data)

# Normalize by the month's total theme mentions (all themes, not only the tracked ones)
timeline_df['percentage'] = (timeline_df['count'] / timeline_df['total'].replace(0, np.nan) * 100).round(2)

# Plot the theme timeline
plt.figure(figsize=(20, 10))

# Get unique themes and assign colors
unique_themes = timeline_df['theme'].unique()
colors = plt.cm.Dark2(np.linspace(0, 1, len(unique_themes)))
theme_colors = dict(zip(unique_themes, colors))

# Create separate trend line for each theme
for theme in unique_themes:
    theme_data = timeline_df[timeline_df['theme'] == theme]
    plt.plot(theme_data['date'], theme_data['percentage'], label=theme, linewidth=2.5,
             color=theme_colors[theme], marker='o', markersize=3)

# Label the plot once the data are drawn
plt.xlabel('Date', fontsize=14)
plt.ylabel('Share of Monthly Theme Mentions (%)', fontsize=14)
plt.title('Evolution of Key Media Themes Over Time', fontsize=20, fontweight='bold')
plt.grid(True, alpha=0.3)

# Get y-axis limits *after* the data are plotted
y_lim = plt.gca().get_ylim()

# Add election markers with fixed y-position
for election, date in election_dates.items():
    plt.axvline(x=date, color='black', linestyle='--', alpha=0.5)
    # Calculate y position based on current y-axis limits
    y_pos = y_lim[1] * 0.95
    plt.text(date, y_pos, election, rotation=90, ha='right', fontsize=10)

plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05), ncol=len(unique_themes),
           fontsize=12, frameon=True)

# Format x-axis date labels
plt.gcf().autofmt_xdate()
plt.tight_layout()
plt.show()
Longitudinal Theme Analysis:
This visualization tracks each key theme's share of all theme mentions per month, revealing:
How media focus evolves before, during, and after election periods
Seasonal patterns in thematic coverage
Long-term trends in media priorities
The relationship between certain themes and specific elections
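The month-by-month loop above can also be written with pandas' `explode` and `crosstab`. The minimal sketch below runs on made-up rows that use the `parsed_date` and `V2Themes` column names from above. It computes each theme's percentage of all theme mentions in each calendar month.

```python
import pandas as pd

# Made-up rows in the shape of df (parsed_date, V2Themes)
sample = pd.DataFrame({
    "parsed_date": pd.to_datetime(["2020-10-03", "2020-10-20", "2020-11-05"]),
    "V2Themes": ["IMMIGRATION,10;LEADER,40;", "LEADER,5;LEADER,90;", "IMMIGRATION,12;"],
})

# One row per theme mention, tagged with its calendar month
mentions = (
    sample.dropna(subset=["V2Themes"])
    .assign(month=lambda d: d["parsed_date"].dt.strftime("%Y-%m"),
            theme=lambda d: d["V2Themes"].str.split(";"))
    .explode("theme")
    .loc[lambda d: d["theme"] != ""]                            # drop the piece after the last ';'
    .assign(theme=lambda d: d["theme"].str.split(",").str[0])  # keep the code, drop the offset
)

# Percentage of each month's theme mentions that belong to each theme
share = pd.crosstab(mentions["month"], mentions["theme"], normalize="index") * 100
print(share.round(2))
```

The denominator is every theme mention in the month, so on the full data the shares of a few tracked themes need not add up to 100%.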
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# combined_df (built outside this section) holds ABC, MSNBC and Fox records
# with 'media', 'parsed_date' and 'tone' columns
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['font.family'] = 'sans-serif'

networks = ['abc', 'msnbc', 'fox']
network_colors = ['#1f77b4', '#ff7f0e', '#2ca02c']

plt.figure(figsize=(12, 6))
# Assign x to hue (legend=False) so the palette applies without seaborn's FutureWarning
sns.boxplot(x='media', y='tone', hue='media', data=combined_df, order=networks,
            hue_order=networks, palette=network_colors, legend=False)
plt.title('Distribution of Tone Scores Across News Networks', fontsize=16, fontweight='bold')
plt.xlabel('News Network', fontsize=12)
plt.ylabel('Tone Score', fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Monthly mean tone per network
combined_df['year_month'] = combined_df['parsed_date'].dt.to_period('M')
media_tone_trend = combined_df.groupby(['year_month', 'media'])['tone'].mean().reset_index()
media_tone_trend['year_month'] = pd.to_datetime(media_tone_trend['year_month'].astype(str))

plt.figure(figsize=(16, 8))
for media, color in zip(networks, network_colors):
    # .copy() gives an independent frame, avoiding SettingWithCopyWarning below
    media_data = media_tone_trend[media_tone_trend['media'] == media].copy()
    plt.plot(media_data['year_month'], media_data['tone'], label=media.upper(), color=color, linewidth=2)
    # 3-month centered rolling mean (dashed)
    media_data['rolling'] = media_data['tone'].rolling(window=3, center=True).mean()
    plt.plot(media_data['year_month'], media_data['rolling'], color=color, linestyle='--', alpha=0.7)

for label, date in event_dates.items():
    plt.axvline(date, color='gray', linestyle='--', alpha=0.5)
    plt.text(date, media_tone_trend['tone'].min() + 0.5, label, rotation=90, va='bottom', fontsize=10)

plt.title('Tone Trend Comparison Across News Networks', fontsize=16, fontweight='bold')
plt.xlabel('Year', fontsize=12)
plt.ylabel('Average Tone Score', fontsize=12)
plt.legend(title='News Network')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
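Adding a column to a filtered slice such as `media_data` is what triggers pandas' SettingWithCopyWarning. One way to avoid the slice entirely is to compute each network's 3-month centered rolling mean on the full monthly table with `groupby(...).transform`. The sketch below runs on made-up monthly means whose column names match media_tone_trend.

```python
import pandas as pd

# Made-up monthly mean tone for two networks
trend = pd.DataFrame({
    "year_month": pd.to_datetime(["2020-09-01", "2020-10-01", "2020-11-01", "2020-12-01"] * 2),
    "media": ["abc"] * 4 + ["fox"] * 4,
    "tone": [-3.0, -4.0, -2.0, -3.0, -5.0, -6.0, -4.0, -5.0],
})

# Sort so each network's months are in order before rolling
trend = trend.sort_values(["media", "year_month"])

# Rolling mean computed separately within each network, aligned back to the original rows
trend["rolling"] = (
    trend.groupby("media")["tone"]
    .transform(lambda s: s.rolling(window=3, center=True).mean())
)
print(trend)
```

The first and last month of each network have no centered 3-month window, so they come out as NaN. The original per-network loop shows the same gaps.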
6 Differences in Topic Coverage Across Networks
6.1 Cross-Network Topic Distribution
Code
def preprocess_themes(df):
    """Return the theme code (offset stripped) for every mention in df['V2Themes']."""
    themes = df['V2Themes'].dropna().str.split(';')
    return [theme.split(',')[0] for sublist in themes for theme in sublist if theme]

# Top 20 themes per network
media_themes = {}
for media in ['abc', 'msnbc', 'fox']:
    media_df = combined_df[combined_df['media'] == media]
    themes = preprocess_themes(media_df)
    media_themes[media] = Counter(themes).most_common(20)

theme_comparison = []
for media, themes in media_themes.items():
    for theme, count in themes:
        theme_comparison.append({
            'media': media,
            'theme': theme_name_mapping.get(theme, theme),
            'count': count,
        })
theme_comparison_df = pd.DataFrame(theme_comparison)

# Themes in all three networks' top 20 (intersection stays correct even if it becomes empty)
common_themes = set.intersection(*(set(t[0] for t in media_themes[m]) for m in ['abc', 'msnbc', 'fox']))

print(f"{len(common_themes)} themes appear in the top 20 of all three networks:")
for theme in common_themes:
    print(f"- {theme_name_mapping.get(theme, theme)}")

11 themes appear in the top 20 of all three networks:
- USPEC_POLICY1
- GENERAL_HEALTH
- Government
- Presidents
- TRIAL
- Conflict & Fragility
- Safety
- Crisis Reporting
- WB_696_PUBLIC_SECTOR_MANAGEMENT
- General Politics
- Leaders
6.2 Topic comparison visualization
Code
common_theme_counts = theme_comparison_df[theme_comparison_df['theme'].isin(
    [theme_name_mapping.get(t, t) for t in common_themes])]

plt.figure(figsize=(14, 8))
sns.barplot(x='count', y='theme', hue='media', data=common_theme_counts,
            palette=['#1f77b4', '#ff7f0e', '#2ca02c'])
plt.title('Coverage of Common Themes Across Networks', fontsize=16, fontweight='bold')
plt.xlabel('Frequency', fontsize=12)
plt.ylabel('Theme', fontsize=12)
plt.legend(title='News Network')
plt.tight_layout()
plt.show()

# Themes in one network's top 20 but in neither of the other two
unique_themes = {}
for media in ['abc', 'msnbc', 'fox']:
    other_media = {'abc', 'msnbc', 'fox'} - {media}
    media_themes_set = set(t[0] for t in media_themes[media])
    for other in other_media:
        media_themes_set -= set(t[0] for t in media_themes[other])
    unique_themes[media] = media_themes_set

for media, themes in unique_themes.items():
    if themes:
        print(f"\n{media.upper()} unique themes:")
        for theme in themes:
            print(f"- {theme_name_mapping.get(theme, theme)}")
    else:
        print(f"\n{media.upper()} has no unique themes")
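The common and unique theme lists reduce to set algebra over each network's top-20 theme codes. The compact sketch below uses made-up top-theme sets with illustrative lowercase labels. Common themes are the intersection of all sets; a network's unique themes are its set minus the union of the other two.

```python
# Made-up top-theme sets for three networks
top_sets = {
    "abc": {"leader", "immigration", "health", "weather"},
    "msnbc": {"leader", "trial", "health"},
    "fox": {"leader", "immigration", "trial", "border"},
}

# Themes shared by every network
common = set.intersection(*top_sets.values())

# Themes that appear for one network and for no other
unique = {
    media: themes - set().union(*(s for m, s in top_sets.items() if m != media))
    for media, themes in top_sets.items()
}

print("common:", sorted(common))
for media, themes in unique.items():
    print(f"{media.upper()} unique:", sorted(themes) or "none")
```

Because both results depend on top-20 cutoffs, a theme ranked 21st for one network still counts as "unique" to the others. The lists describe differences in ranking rather than the absence of coverage.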
7 Differences in media behavior during election cycles
7.1 Comparison of changes in tone before and after the election
Code
election_shift_results = []
for media in ['abc', 'msnbc', 'fox']:
    media_df = combined_df[combined_df['media'] == media]
    for label, date in election_dates.items():
        pre = media_df[(media_df['parsed_date'] >= date - pd.DateOffset(months=3)) &
                       (media_df['parsed_date'] < date)]
        post = media_df[(media_df['parsed_date'] >= date) &
                        (media_df['parsed_date'] < date + pd.DateOffset(months=3))]
        election_shift_results.append({
            'media': media,
            'election': label,
            'pre_avg_tone': pre['tone'].mean(),
            'post_avg_tone': post['tone'].mean(),
            'tone_shift': post['tone'].mean() - pre['tone'].mean(),
            'pre_articles': len(pre),
            'post_articles': len(post),
        })

election_shift_df = pd.DataFrame(election_shift_results)

plt.figure(figsize=(14, 8))
sns.barplot(x='election', y='tone_shift', hue='media', data=election_shift_df,
            palette=['#1f77b4', '#ff7f0e', '#2ca02c'])
plt.axhline(0, color='black', linewidth=0.5)
plt.title('Tone Shift Before/After Elections by News Network', fontsize=16, fontweight='bold')
plt.xlabel('Election', fontsize=12)
plt.ylabel('Tone Shift (Post - Pre)', fontsize=12)
plt.legend(title='News Network')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
7.2 Differences in Topic Preferences in Election Coverage
Code
election_theme_results = []
for media in ['abc', 'msnbc', 'fox']:
    media_df = combined_df[combined_df['media'] == media]
    for label, date in election_dates.items():
        # Articles within three months either side of election day
        period_df = media_df[(media_df['parsed_date'] >= date - pd.DateOffset(months=3)) &
                             (media_df['parsed_date'] <= date + pd.DateOffset(months=3))]
        themes = preprocess_themes(period_df)
        for theme, count in Counter(themes).most_common(10):
            election_theme_results.append({
                'media': media,
                'election': label,
                'theme': theme_name_mapping.get(theme, theme),
                'count': count,
            })

election_theme_df = pd.DataFrame(election_theme_results)

# Plot the 2020 presidential election as an example
election_2020 = election_theme_df[election_theme_df['election'] == '2020 Presidential']
plt.figure(figsize=(14, 10))
sns.barplot(x='count', y='theme', hue='media', data=election_2020,
            palette=['#1f77b4', '#ff7f0e', '#2ca02c'])
plt.title('Theme Coverage During 2020 Election by Network', fontsize=16, fontweight='bold')
plt.xlabel('Frequency', fontsize=12)
plt.ylabel('Theme', fontsize=12)
plt.legend(title='News Network')
plt.tight_layout()
plt.show()
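`preprocess_themes` is defined earlier in the document. As a self-contained illustration, the sketch below uses a hypothetical stand-in, `extract_themes`, that flattens V2Themes strings the same way the co-occurrence code in 9.1 does (split on `;`, keep the text before the first `,`). It counts every mention, so a theme repeated within one article is counted more than once; the real `preprocess_themes` may behave differently, and the three toy articles are made up.

```python
from collections import Counter

import pandas as pd


def extract_themes(v2themes: pd.Series) -> list[str]:
    """Hypothetical stand-in for preprocess_themes: flatten GDELT V2Themes
    strings ("THEME,offset;THEME,offset;...") into a list of theme codes."""
    themes = []
    for entry in v2themes.dropna():
        themes.extend(t.split(",")[0] for t in entry.split(";") if t)
    return themes


def top_themes_in_window(df, date, months=3, n=10):
    """Top-n themes among articles within +/- `months` of `date` (inclusive),
    matching the window used in the election-theme loop above."""
    lo = date - pd.DateOffset(months=months)
    hi = date + pd.DateOffset(months=months)
    window = df[(df["parsed_date"] >= lo) & (df["parsed_date"] <= hi)]
    return Counter(extract_themes(window["V2Themes"])).most_common(n)


df = pd.DataFrame({
    "parsed_date": pd.to_datetime(["2020-10-01", "2020-11-10", "2021-06-01"]),
    "V2Themes": ["ELECTION,10;TAX_FNCACT_PRESIDENT,52",
                 "ELECTION,5;IMMIGRATION,80",
                 "HEALTH,3"],  # this article falls outside the window
})
print(top_themes_in_window(df, pd.Timestamp("2020-11-03"), n=2))
```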
8 Advanced Exploratory Data Analysis
8.1 Time Series Decomposition Analysis
Code
from statsmodels.tsa.seasonal import seasonal_decompose

for media in ['abc', 'msnbc', 'fox']:
    media_tone = combined_df[combined_df['media'] == media].set_index('parsed_date')['tone']
    # 'ME' (month end) replaces the deprecated 'M' resampling alias
    monthly_tone = media_tone.resample('ME').mean().dropna()

    # Additive decomposition with an annual (12-month) seasonal period
    decomposition = seasonal_decompose(monthly_tone, model='additive', period=12)

    # decomposition.plot() creates its own figure, so resize that figure
    # instead of opening an empty one with plt.figure()
    fig = decomposition.plot()
    fig.set_size_inches(14, 10)
    fig.suptitle(f'Time Series Decomposition of {media.upper()} Tone',
                 y=1.02, fontsize=16, fontweight='bold')
    fig.tight_layout()
    plt.show()
9 Network Analysis Visualization
9.1 Topic Co-occurrence Network
Code
import networkx as nx
from itertools import combinations

def build_cooccurrence_network(df, top_n_themes=30):
    # Restrict the network to this outlet's most frequent themes
    themes = preprocess_themes(df)
    top_themes = {t for t, _ in Counter(themes).most_common(top_n_themes)}

    # Count each unordered pair of top themes that appear in the same article
    cooccur = {}
    for themes_list in df['V2Themes'].dropna().str.split(';'):
        themes_in_article = [t.split(',')[0] for t in themes_list if t]
        themes_in_article = [t for t in themes_in_article if t in top_themes]
        for pair in combinations(set(themes_in_article), 2):
            sorted_pair = tuple(sorted(pair))
            cooccur[sorted_pair] = cooccur.get(sorted_pair, 0) + 1

    # Edge weight = number of articles in which the two themes co-occur
    G = nx.Graph()
    for (t1, t2), weight in cooccur.items():
        G.add_edge(theme_name_mapping.get(t1, t1), theme_name_mapping.get(t2, t2), weight=weight)
    return G

media_graphs = {}
for media in ['abc', 'msnbc', 'fox']:
    media_df = combined_df[combined_df['media'] == media]
    media_graphs[media] = build_cooccurrence_network(media_df)

plt.figure(figsize=(14, 12))
G = media_graphs['abc']
pos = nx.spring_layout(G, k=0.3, iterations=50)
# Scale edge widths by co-occurrence count
weights = [G[u][v]['weight'] / 10 for u, v in G.edges()]
nx.draw_networkx(G, pos, with_labels=True, node_size=800, node_color='skyblue',
                 font_size=10, width=weights, edge_color='gray')
plt.title('ABC News Theme Co-occurrence Network', fontsize=16, fontweight='bold')
plt.axis('off')
plt.tight_layout()
plt.show()
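The pair-counting step inside `build_cooccurrence_network` can also be written with `collections.Counter`, which the document already imports. Here is a minimal sketch on three toy articles; the articles and the `cooccurrence_counts` name are hypothetical. As in the function above, each theme counts at most once per article, so a pair's count is the number of articles in which both themes appear.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(theme_strings, keep):
    """Count, for each unordered pair of themes, how many articles mention both.

    theme_strings: iterable of GDELT V2Themes strings ("THEME,offset;...").
    keep: set of theme codes to retain (for example, the top-N themes).
    """
    pairs = Counter()
    for s in theme_strings:
        # A set, so a theme mentioned several times in one article counts once
        themes = {t.split(",")[0] for t in s.split(";") if t} & keep
        # sorted() gives each pair one canonical order: (A, B), never (B, A)
        pairs.update(combinations(sorted(themes), 2))
    return pairs


articles = [
    "ELECTION,10;IMMIGRATION,40;ELECTION,90",
    "ELECTION,5;IMMIGRATION,12;HEALTH,30",
    "HEALTH,7;ECON,2",  # ECON is filtered out, leaving no pair here
]
counts = cooccurrence_counts(articles, keep={"ELECTION", "IMMIGRATION", "HEALTH"})
print(counts.most_common())
```

Keeping `keep` as a set makes each membership test constant-time, which matters when the filter runs over every theme mention in a large corpus.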
10 Conclusion
10.1 Key Findings
Our analysis of ABC News coverage across five election cycles reveals several notable patterns:
Tone Shifts: All five elections showed a positive tone shift: average tone was higher in the post-election period than in the pre-election period.
Thematic Evolution: Election coverage transitions from campaign-focused themes before elections to governance and policy themes afterward.
Consistent Themes: Presidential leadership, immigration, and general government operations persist as dominant themes across all periods.
Temporal Patterns: Media tone shows clear cyclical patterns aligned with election cycles, suggesting that electoral politics influences news sentiment.
10.2 Methodological Notes
GDELT’s tone score can in principle range from -100 (extremely negative) to +100 (extremely positive), but in practice most values fall between -10 and +10
Most articles cluster between -5 and +1, with ABC News averaging around -2.7
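The figures above are computed from the first comma-separated field of V2Tone, the overall tone score. Here is a minimal sketch of extracting that field and computing the mean and the share of articles in the -5 to +1 band. It assumes the `tone` column used throughout is this first field. The four V2Tone strings are made up, and because the parsing reads only the first field, any further fields are ignored.

```python
import pandas as pd

# Hypothetical V2Tone strings (overall tone, positive, negative)
df = pd.DataFrame({"V2Tone": [
    "-2.5,1.5,4.0",
    "-4.0,1.0,5.0",
    "0.5,3.0,2.5",
    "-6.0,0.5,6.5",
]})

# The first comma-separated field is the overall tone score
df["tone"] = df["V2Tone"].str.split(",").str[0].astype(float)

# Share of articles with -5 <= tone <= +1 (between() is inclusive by default)
share_in_band = df["tone"].between(-5, 1).mean()
print(f"mean tone = {df['tone'].mean():.2f}")
print(f"share in [-5, +1] = {share_in_band:.0%}")
```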